Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
نویسندگان
چکیده
We consider the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distributionfree, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented.
منابع مشابه
Supplementary material for : Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
The function k : Ω×Ω→ R is called a kernel on Ω if the matrix (k(xi, xj))1≤i,j≤n is positive semidefinite for all positive integers n and all x1, . . . , xn ∈ Ω. It is well-known that if k is a kernel on Ω, then there exists a Hilbert space H̃ and Φ̃ : Ω→ H̃ such that k(x, x′) = 〈Φ̃(x), Φ̃(x)〉H̃. While H̃ and Φ̃ are not uniquely determined by k, the Hilbert space of functionsHk = {〈v, Φ̃(·)〉H̃ : v ∈ H̃} i...
متن کاملSample-oriented Domain Adaptation for Image Classification
Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. The conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution with test images (target domain). Also, many real world applicat...
متن کاملDeep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...
متن کاملDetecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملOn Deep Representation Learning from Noisy Web Images
The keep-growing content of Web images may be the next important data source to scale up deep neural networks, which recently obtained a great success in the ImageNet classification challenge and related tasks. This prospect, however, has not been validated on convolutional networks (convnet) – one of best performing deep models – because of their supervised regime. While unsupervised alternati...
متن کامل